{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Modeling and Simulation in Python\n",
    "\n",
    "Chapter 10\n",
    "\n",
    "Copyright 2017 Allen Downey\n",
    "\n",
    "License: [Creative Commons Attribution 4.0 International](https://creativecommons.org/licenses/by/4.0)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Configure Jupyter so figures appear in the notebook\n",
    "%matplotlib inline\n",
    "\n",
    "# Configure Jupyter to display the assigned value after an assignment\n",
    "%config InteractiveShell.ast_node_interactivity='last_expr_or_assign'\n",
    "\n",
    "# import functions from the modsim.py module\n",
    "from modsim import *\n",
    "\n",
    "from pandas import read_html"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Under the hood\n",
    "\n",
    "To get a `DataFrame` and a `Series`, I'll read the world population data and select a column.\n",
    "\n",
    "`DataFrame` and `Series` contain a variable called `shape` that indicates the number of rows and columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "filename = 'data/World_population_estimates.html'\n",
    "tables = read_html(filename, header=0, index_col=0, decimal='M')\n",
    "table2 = tables[2]\n",
    "table2.columns = ['census', 'prb', 'un', 'maddison', \n",
    "                  'hyde', 'tanton', 'biraben', 'mj', \n",
    "                  'thomlinson', 'durand', 'clark']\n",
    "table2.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "census = table2.census / 1e9\n",
    "census.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "un = table2.un / 1e9\n",
    "un.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A `DataFrame` contains `index`, which labels the rows.  It is an `Int64Index`, which is similar to a NumPy array."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "table2.index"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And `columns`, which labels the columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "table2.columns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And `values`, which is an array of values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "table2.values"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A `Series` does not have `columns`, but it does have `name`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "census.name"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It contains `values`, which is an array."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "census.values"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And it contains `index`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "census.index"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you ever wonder what kind of object a variable refers to, you can use the `type` function.  The result indicates what type the object is, and the module where that type is defined.\n",
    "\n",
    "`DataFrame`, `Int64Index`, `Index`, and `Series` are defined by Pandas.\n",
    "\n",
    "`ndarray` is defined by NumPy."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "type(table2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "type(table2.index)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "type(table2.columns)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "type(table2.values)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "type(census)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "type(census.index)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "type(census.values)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Optional exercise\n",
    "\n",
    "The following exercise provides a chance to practice what you have learned so far, and maybe develop a different growth model.  If you feel comfortable with what we have done so far, you might want to give it a try.\n",
    "\n",
    "**Optional Exercise:** On the Wikipedia page about world population estimates, the first table contains estimates for prehistoric populations.  The following cells process this table and plot some of the results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "filename = 'data/World_population_estimates.html'\n",
    "tables = read_html(filename, header=0, index_col=0, decimal='M')\n",
    "len(tables)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Select `tables[1]`, which is the second table on the page."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "table1 = tables[1]\n",
    "table1.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Not all agencies and researchers provided estimates for the same dates.  Again `NaN` is the special value that indicates missing data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
    "table1.tail()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Again, we'll replace the long column names with more convenient abbreviations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
    "table1.columns = ['PRB', 'UN', 'Maddison', 'HYDE', 'Tanton', \n",
    "                  'Biraben', 'McEvedy & Jones', 'Thomlinson', 'Durand', 'Clark']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Some of the estimates are in a form Pandas doesn't recognize as numbers, but we can coerce them to be numeric."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [],
   "source": [
    "for col in table1.columns:\n",
    "    table1[col] = pd.to_numeric(table1[col], errors='coerce')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here are the results.  Notice that we are working in millions now, not billions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "table1.plot()\n",
    "decorate(xlim=[-10000, 2000], xlabel='Year', \n",
    "         ylabel='World population (millions)',\n",
    "         title='Prehistoric population estimates')\n",
    "plt.legend(fontsize='small');"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can use `xlim` to zoom in on everything after Year 0."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [],
   "source": [
    "table1.plot()\n",
    "decorate(xlim=[0, 2000], xlabel='Year', \n",
    "         ylabel='World population (millions)',\n",
    "         title='CE population estimates')\n",
    "plt.legend(fontsize='small');"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "See if you can find a model that fits these data well from Year 0 to 1950.\n",
    "\n",
    "How well does your best model predict actual population growth from 1950 to the present?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Solution goes here"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Solution goes here"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
